• 1. Introduction
  • 2. Apple
    • 2.1 Data visualisation
      • 2.1.1 Average prices per month over the 12 years
      • 2.1.2 Daily prices over one month
      • 2.1.3 Comparison of average prices per month at opening over 12 years and prices at opening over 1 month
    • 2.2 Removing the trend and seasonality
      • 2.2.1 Moving Average Filter
      • 2.2.2 Differencing
    • 2.3 ARMA models
      • 2.3.1 Forecasting AAPL stock price with an ARMA model
    • 2.4 GARCH models
      • 2.4.1 Forecasting Apple stock prices with the standard GARCH model
      • 2.4.2 Modified GARCH, an example: EGARCH
  • 3 References

1. Introduction

Our dataset consists of historical daily stock prices of Apple (AAPL) over 12 years, from 2006 to 2017. The data comes from Kaggle (https://www.kaggle.com/szrlee/stock-time-series-20050101-to-20171231). The variables are the date (Date), the price at market open (Open), the highest price of the day (High), the lowest price of the day (Low), the price at market close (Close), the number of shares bought and sold (Volume), and the name of the stock (Name).

The goal of this project is to become familiar with the theory of time series and to make predictions with a time series. We explore trend and seasonality elimination methods, ARMA models and GARCH models for predicting a financial time series.

2. Apple

2.1 Data visualisation

Having the daily stock prices of Apple over 12 years, we can study our data at several levels of aggregation. In our study we will consider two levels:

1 - The average prices per month over the 12 years

2 - Daily prices over one month
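The first level of aggregation can be sketched as follows. This is a minimal illustration in Python (the analysis itself was done in R); the function name `monthly_means` and the toy prices are hypothetical and are not taken from the Kaggle file.

```python
from collections import defaultdict

def monthly_means(rows):
    """Average a daily series per calendar month.

    rows: iterable of (date_string 'YYYY-MM-DD', price) pairs.
    Returns a list of ('YYYY-MM', mean_price) pairs sorted by month.
    """
    buckets = defaultdict(list)
    for date, price in rows:
        buckets[date[:7]].append(price)  # 'YYYY-MM' is the month key
    return [(month, sum(v) / len(v)) for month, v in sorted(buckets.items())]

# Hypothetical toy prices, for illustration only.
daily = [("2017-11-29", 170.0), ("2017-11-30", 172.0),
         ("2017-12-01", 169.0), ("2017-12-04", 171.0), ("2017-12-05", 173.0)]
monthly = monthly_means(daily)
```

The second level simply keeps the daily rows of a single month, for example those whose date starts with "2017-12".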

2.1.1 Average prices per month over the 12 years

[Figure: Evolution of the prices and volume of the shares as a function of time. Series: Open, High, Low, Close, Volume (monthly averages).]

2.1.2 Daily prices over one month

[Figure: Close prices of Apple stock in December 2017. x-axis: Date; y-axis: Dollars.]

2.1.3 Comparison of average prices per month at opening over 12 years and prices at opening over 1 month

[Figure: Long run vs short run evolution of prices. Series: 12 years, 1 month.]

Looking at the auto-correlation functions of the data, we see that the series cannot be reasonably modeled by a stationary series. The ACF values are all outside the confidence interval, which does not correspond to the ACF of a stationary series.
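The check described above can be sketched in Python (the analysis itself was done in R). The helper names `acf` and `white_noise_bound` are hypothetical, and the bound ±1.96/√n is the usual approximate 95% interval for the sample ACF of white noise. The toy series is a pure linear trend, not the Apple data.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    out = []
    for h in range(max_lag + 1):
        ch = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h)) / n
        out.append(ch / c0)
    return out

def white_noise_bound(n):
    """Approximate 95% bound for the sample ACF of white noise."""
    return 1.96 / math.sqrt(n)

# Toy example: a trending series has an ACF that stays far outside the bound.
trend = [float(t) for t in range(100)]
rho = acf(trend, 5)
bound = white_noise_bound(len(trend))
outside = [abs(r) > bound for r in rho[1:]]
```

For a trending series every entry of `outside` is true, which is the pattern described for the price series.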

We notice that the series does not behave in the same way depending on whether we look at the monthly averages over 12 years or the daily data over one month. At level 1, the data seem to grow linearly and can potentially be approximated by a straight line (a polynomial of degree 1). At level 2, the data seem to have a more periodic behavior. We will now remove the trend and seasonality from the data.

We want to know if the residuals can be modeled by a stationary series.

2.2 Removing the trend and seasonality

In the previous section we analyzed the pattern of the data. We have seen that the data at level 1 grow in a rather linear way, while the data at level 2 have a rather periodic behavior. We explore the possibility of representing the data as a realization of the Classical Decomposition Model:

$$
X_t = m_t + s_t + Y_t \tag{1}
$$

where $m_t$ represents the trend, $s_t$ the seasonality with a known period and $Y_t$ a stationary random noise. We will test two methods of estimating and eliminating the trend and seasonality.

2.2.1 Moving Average Filter

In this section, we will use a moving average to estimate the trend.

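$$
W_t = (2q+1)^{-1} \sum_{j=-q}^{q} X_{t-j} \tag{2}
$$

where $q \in \mathbb{N}$. Substituting (1) into (2) shows that $W_t \approx m_t$ when the trend is approximately linear over the window $[t-q, t+q]$ and the other terms average out to roughly zero, so $W_t$ gives an estimate $\hat{m}_t$ of $m_t$.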
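As an illustration of the two-sided moving average filter, here is a minimal Python sketch (the analysis itself was done in R). The function name `moving_average` and the choice to return `None` where the window does not fit are assumptions made for this example.

```python
def moving_average(x, q):
    """Two-sided moving average W_t = (2q+1)^(-1) * sum_{j=-q}^{q} x[t-j].

    Positions where the full window does not fit are left as None.
    """
    n = len(x)
    w = [None] * n
    for t in range(q, n - q):
        w[t] = sum(x[t - q:t + q + 1]) / (2 * q + 1)
    return w

# Toy example: a purely linear series is reproduced exactly by the filter,
# so the residuals x[t] - w[t] are zero wherever the filter is defined.
x = [3.0 * t + 1.0 for t in range(7)]
trend = moving_average(x, q=2)
residuals = [xi - wi for xi, wi in zip(x, trend) if wi is not None]
```

Because the filter passes a linear trend through unchanged, whatever remains in the residuals of a real series comes from the non-linear part of the trend and from the noise.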

We will take the log of the data because we use a linear model.

[Figure: Log prices and their moving average. Series: prices, Moving average.]

[Figure: Residuals obtained from the Moving Average filter.]

After computing the residuals $\hat{Y}_t = X_t - \hat{m}_t$, we obtain their autocorrelation function for 40 lags. We see that most of the values after lag 2 remain within the confidence interval.

Unit Root test to test stationarity of the Moving Average residuals

We can apply a unit root test (or Dickey-Fuller test) which allows us to know if the obtained residuals are stationary or not.

The p-value of the test is 0.01, which means that we can reject the null hypothesis of a unit root at the 5% significance level. Therefore, the residuals obtained with the Moving Average filter can be modeled by a stationary series.
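To make the idea of the test concrete, here is a rough Python sketch of the basic (non-augmented) Dickey-Fuller statistic with a constant term. It is not the R routine used for the results above: the function name is hypothetical, no lagged differences are included, and the critical value of about -2.86 is the asymptotic 5% value for the constant-only case, taken here as an assumption.

```python
import math
import random

def dickey_fuller_stat(y):
    """t-statistic of rho in the regression  dy_t = a + rho * y_{t-1} + e_t."""
    x = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (d - md) for v, d in zip(x, dy))
    rho = sxy / sxx                      # OLS slope
    a = md - rho * mx                    # OLS intercept
    ssr = sum((d - a - rho * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)  # standard error of the slope
    return rho / se

# Toy example: simulated white noise is stationary, so the statistic
# should fall far below the critical value.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
stat = dickey_fuller_stat(noise)
CRITICAL_5PCT = -2.86  # assumed asymptotic 5% critical value (constant, no trend)
reject_unit_root = stat < CRITICAL_5PCT
```

A statistic below the critical value leads to rejecting the unit root, which corresponds to the small p-value reported for the residuals.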

2.2.2 Differencing

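In this method, we introduce the lag-$d$ difference operator: $\nabla_d X_t = X_t - X_{t-d}$. By applying it to equation (1), with $d$ the period of the seasonal component (so that $s_t = s_{t-d}$), we have:

$$
\nabla_d X_t = m_t - m_{t-d} + Y_t - Y_{t-d}
$$

This expresses the difference $\nabla_d X_t$ as a function of trend and noise only. From there we can eliminate the trend by applying the operator $\nabla^j X_t = (1-B)^j X_t$, where $B$ is the backward shift operator ($B X_t = X_{t-1}$). We use the diff function from R to apply this method.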
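The effect of both operators can be checked on toy data. The following Python sketch is an illustration only (the analysis itself used R's diff function); the name `difference` and its `lag` and `times` parameters are hypothetical.

```python
def difference(x, lag=1, times=1):
    """Apply the lag-`lag` difference x[t] - x[t-lag], `times` times in a row."""
    for _ in range(times):
        x = [x[t] - x[t - lag] for t in range(lag, len(x))]
    return x

# A linear trend becomes a constant after one difference and zero after two.
linear = [5.0 + 2.0 * t for t in range(8)]
once = difference(linear)
twice = difference(linear, times=2)

# A component with period 3 is removed by the lag-3 difference.
seasonal = [10.0, 20.0, 30.0] * 4
deseasonalized = difference(seasonal, lag=3)
```

Each application shortens the series by `lag` observations, which is worth keeping in mind when aligning the differenced series with dates.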

[Figure: Log of prices and trend-less series. Series: Prices, Differenced.]

We plot the auto-correlation function of the differenced series.

Several values of the ACF still fall outside the confidence interval, so the residuals are not IID.

Unit Root test to test stationarity of the differenced series

We run a Unit Root test and obtain a p-value of 0.01, which means that we can reject the null hypothesis of a unit root at the 5% significance level. Therefore, the residuals obtained by differencing the series can be modeled by a stationary series.

In conclusion, both methods, differencing and the Moving Average filter, have made our time series stationary.

2.3 ARMA models

In this section we want to determine if we can find an ARMA model that models our data reasonably well.

ARMA (Autoregressive Moving Average) models explain the relationship of the data series with random noise (the Moving Average part) and with its prior values (the Autoregressive part).

Mathematically:

$X_t$ is an ARMA(p,q) process if $X_t$ is stationary and if, for all $t$,

$$
X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = Z_t - \theta_1 Z_{t-1} - \dots - \theta_q Z_{t-q}
$$

where $(Z_t)$ is a white noise.

Using R, we manage to find the ARMA model that best fits our series.

$$
X_t - X_{t-1} - 1.10 = Z_t - 2.87 \times 10^{-1}\, Z_{t-1}
$$

It turns out to be an ARIMA(0,1,1) model, i.e. a model such that our data, differenced once, follow an ARMA(0,1).

This is consistent with our results above: we have seen that differencing the series once gives a stationary series, and this differenced series is well modeled by an MA(1) process.

Using R for the daily price data over a month, we see that these data are better modeled by an ARMA(1,0) model.

$$
X_t - 6.96 \times 10^{-1}\, X_{t-1} = Z_t
$$

This means that the daily prices seem to be better explained by past realizations while the averages are better explained by noise.

2.3.1 Forecasting AAPL stock price with an ARMA model

In this section we will predict the monthly average values of Apple’s stock with an ARIMA(0,1,1) model. We will estimate the error of the prediction through the root mean square error (RMSE).
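To show how an ARIMA(0,1,1) model with drift produces forecasts, here is a minimal Python sketch (the forecasts in this report were produced with R). It assumes the model is written as $X_t - X_{t-1} - \mu = Z_t - \theta Z_{t-1}$ and that the innovation before the first observation is 0. The function name and the parameter values in the example are hypothetical.

```python
def arima011_forecast(x, mu, theta, steps):
    """Forecast X_{n+1}, ..., X_{n+steps} for X_t - X_{t-1} - mu = Z_t - theta * Z_{t-1}.

    Innovations are rebuilt recursively from the data, starting from Z = 0
    (an assumption of this sketch).
    """
    z = 0.0
    for t in range(1, len(x)):
        # Z_t = (X_t - X_{t-1} - mu) + theta * Z_{t-1}
        z = (x[t] - x[t - 1] - mu) + theta * z
    first = x[-1] + mu - theta * z  # one-step forecast uses the last innovation
    # Beyond one step, future innovations have mean 0, so only the drift is added.
    return [first + h * mu for h in range(steps)]

# Hypothetical values, for illustration only.
forecast = arima011_forecast([0.0, 2.0], mu=1.0, theta=0.5, steps=2)
```

Beyond the first step the forecast is a straight line with slope `mu`, which is why such a model cannot follow the swings of the test set.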

We first divide our dataset into a training sample and a validation sample (test sample) with the ratio 70/30.
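The split and the error measure can be sketched as follows in Python (the analysis itself was done in R). The function names are hypothetical. The split is chronological, so the validation sample always lies after the training sample.

```python
import math

def chronological_split(x, train_ratio=0.7):
    """Split a series into its first `train_ratio` share and the remainder."""
    cut = round(len(x) * train_ratio)
    return x[:cut], x[cut:]

def rmse(actual, predicted):
    """Root mean square error between two sequences of equal length."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy example with made-up numbers.
series = [float(v) for v in range(10)]
train, test = chronological_split(series)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 6.0])
```

The RMSE is expressed in the same unit as the series, here dollars for the monthly average prices.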

We consider the monthly averages.

[Figure: Forecasted values of ARIMA(0,1,1). Series: lower bound, upper bound, forecasted values, test set. y-axis: Average stock price per month in $.]

The RMSE obtained for this prediction is:

RMSE=33.08

This result is not very satisfactory. In the following we will try to see how we can improve it.

2.4 GARCH models

In this section, we explore GARCH models to see whether this class of models can predict our data better. The ARCH and GARCH (Generalized Autoregressive Conditional Heteroscedasticity) models were developed to reflect the properties of financial time series. These properties include skewness, time-varying volatility, and serial dependence without serial correlation. They cannot be captured with traditional linear models such as ARMA.

The ARCH and GARCH models are written as follows:

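Let $Z_t$ be a stationary process such that:

$$
Z_t = \sqrt{h_t}\, e_t, \quad \text{where } (e_t) \text{ is IID } \mathcal{N}(0,1)
$$

$$
h_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i Z_{t-i}^2 \quad \text{for the ARCH model}
$$

$$
h_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i Z_{t-i}^2 + \sum_{i=1}^{q} \beta_i h_{t-i} \quad \text{for the GARCH model} \tag{3}
$$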

These models are applied to the log returns of the closing prices, i.e. $\log(P_t / P_{t-1})$ where $P_t$ is the closing price on day $t$, because log returns tend to be stationary.
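The transformation can be sketched in Python as follows (the analysis itself was done in R). The function name `log_returns` and the toy prices are hypothetical.

```python
import math

def log_returns(prices):
    """Log returns log(P_t / P_{t-1}) of a price series."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]

# Toy closing prices, for illustration only.
prices = [100.0, 110.0, 99.0, 99.0]
r = log_returns(prices)
total = sum(r)  # log returns add up to the log return over the whole period
```

A useful property is that log returns are additive over time: their sum equals the log of the last price divided by the first.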

We start by transforming our data into log returns.

Log returns visualisation

[Figure: Log returns as a function of time.]

We notice in the graph above that the series does indeed look stationary.

In the ACF of the log returns, the values are very close to 0, whereas in the ACF of the squared log returns the values are significantly different from 0. This implies that the log returns are uncorrelated but not independent. The ARCH and GARCH models capture this dependence through the $h_t$ term (the conditional variance, i.e. the volatility).

Log returns distribution

We see that the tails of the distribution of returns are heavier than those of the normal distribution (in green). This means that the returns are not normally distributed: very low and very high returns occur more often than a normal distribution would predict.
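One common way to put a number on heavy tails, not used in the analysis above, is the excess kurtosis, which is 0 for a normal distribution and positive for heavier tails. A minimal Python sketch, with a hypothetical function name and toy data:

```python
def excess_kurtosis(x):
    """Sample excess kurtosis m4 / m2^2 - 3 (0 for a normal distribution)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

# Toy data: mostly quiet days plus two extreme returns gives a large value,
# while returns that are always +1 or -1 have no tails at all.
heavy = excess_kurtosis([0.0] * 98 + [10.0, -10.0])
light = excess_kurtosis([-1.0, 1.0] * 50)
```

A clearly positive value on the log returns would support the visual impression given by the histogram.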

Building a GARCH model

We fit a standard GARCH model (equation (3)) with a constant mean and normally distributed innovations. We obtain the following GARCH(1,1) model:

$$
Y_t = a + Z_t, \quad \text{where } Z_t \text{ is a GARCH(1,1) process with } h_t = 9 \times 10^{-6} + 8 \times 10^{-2}\, Z_{t-1}^2 + 8.96 \times 10^{-1}\, h_{t-1}
$$
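To see what these coefficients imply, the variance recursion can be run by hand. The Python sketch below is an illustration only: it takes the coefficients as 9e-6, 0.08 and 0.896, applies them to the squared previous shock as in equation (3), and starts the recursion at the long-run variance $\alpha_0 / (1 - \alpha_1 - \beta_1)$, which is an assumption of this example. The shocks are made up.

```python
OMEGA, ALPHA, BETA = 9e-6, 8e-2, 8.96e-1  # coefficients of the fitted GARCH(1,1)

def conditional_variances(z, omega=OMEGA, alpha=ALPHA, beta=BETA):
    """h_t = omega + alpha * z_{t-1}^2 + beta * h_{t-1}.

    The recursion starts at the long-run variance (assumption of this sketch).
    """
    h = [omega / (1.0 - alpha - beta)]
    for t in range(1, len(z)):
        h.append(omega + alpha * z[t - 1] ** 2 + beta * h[t - 1])
    return h

long_run_variance = OMEGA / (1.0 - ALPHA - BETA)   # about 3.75e-4
long_run_volatility = long_run_variance ** 0.5     # about 1.9% per day

# Made-up shocks: a single -5% day raises the variance of the next day,
# after which the variance decays slowly back towards its long-run level.
shocks = [0.0, -0.05, 0.0, 0.0]
h = conditional_variances(shocks)
```

Since the two coefficients sum to 0.976, close to 1, a shock to the variance fades only slowly, which matches the volatility clusters visible in the log returns.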

The Ljung-Box test for serial correlation does not reject, at the 5% significance level, the hypothesis that the residuals are uncorrelated (as noted above with the ACF). This test is an argument for the validity of our GARCH model. On the other hand, Pearson’s goodness-of-fit test rejects the null hypothesis that the residuals are normal. This means that there is room to improve our model in this respect.
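For reference, the Ljung-Box statistic is $Q = n(n+2) \sum_{k=1}^{m} \hat{\rho}_k^2 / (n-k)$, compared with a chi-square quantile. The Python sketch below is a hypothetical illustration, not the R routine used above: it uses 10 lags and the 95% chi-square quantile for 10 degrees of freedom (18.307), and it ignores the adjustment of the degrees of freedom that applies to residuals of a fitted model.

```python
def ljung_box(x, lags):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1}^{lags} rho_k^2 / (n - k)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    q = 0.0
    for k in range(1, lags + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
        q += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * q

CHI2_95_DF10 = 18.307  # 95% quantile of the chi-square distribution, 10 degrees of freedom

# Toy example: a trending series is strongly autocorrelated, so Q is far
# above the critical value and the hypothesis of no correlation is rejected.
trend = [float(t) for t in range(100)]
q_stat = ljung_box(trend, 10)
correlated = q_stat > CHI2_95_DF10
```

For the residuals of an adequate model, Q stays below the critical value, which is the situation reported above.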

2.4.1 Forecasting Apple stock prices with the standard GARCH model

We divide our data set in 2 parts (ratio 70/30), with the training sample and the validation sample.

We apply the GARCH(1,1) model on the training data and from there we make a prediction for the values of the validation sample. As before, we measure our error with the RMSE.

[Figure: Residuals forecasted by the GARCH(1,1) model. Series: test set, forecasted values. y-axis: Log returns.]

The graph above shows the predicted and actual values of the log returns between 2014 and 2017. The prediction is rather close to the true values. We obtain an error of:

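RMSE=0.0263

This error is small. Note, however, that it is measured on log returns, whereas the error of the ARIMA(0,1,1) model was measured on monthly prices in dollars, so the two values are not directly comparable.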

Visualization

To better understand the accuracy of our model, we can re-express the log returns in terms of prices and compare the predicted trajectories and the true trajectory.
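Converting log returns back to prices only requires a starting price and the relation $P_t = P_{t-1} \exp(r_t)$. A minimal Python sketch (the analysis itself was done in R), with a hypothetical function name and made-up values:

```python
import math

def prices_from_log_returns(p0, returns):
    """Rebuild a price path from a starting price and a list of log returns."""
    prices = [p0]
    for r in returns:
        prices.append(prices[-1] * math.exp(r))  # P_t = P_{t-1} * exp(r_t)
    return prices

# Made-up example: +10% followed by -50%.
path = prices_from_log_returns(100.0, [math.log(1.1), math.log(0.5)])
```

Because each price is built from the previous one, small errors in the forecasted returns accumulate along the path, so the price trajectories can drift apart even when the return errors are small.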

[Figure: Prices forecasted by the GARCH(1,1) model. Series: test set, forecasted values. y-axis: Price.]

It is possible to improve our model, either by giving it more training samples or by using a GARCH model that does not assume normally distributed residuals, since the Pearson test showed that this assumption does not hold.

2.4.2 Modified GARCH, an example: EGARCH

In this section, we explore the EGARCH model, which is a modification of the GARCH model. As mentioned above, GARCH was developed to reflect the properties of financial time series. EGARCH is less restrictive than GARCH: it does not assume that log returns are Gaussian, it does not require the coefficients in the equation for the conditional variance $h_t$ to be positive, and it lets $h_t$ respond asymmetrically to positive and negative shocks. This has the effect of incorporating the following stylized facts:

  • The distribution of financial data has thick tails

  • Negative shocks at $t-1$ have a stronger impact at $t$ than positive shocks

The model is written as follows:

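$$
Z_t = \sqrt{h_t}\, e_t, \quad e_t \text{ is IID}(0,1)
$$

$$
\ln h_t = c + \alpha_1 g(e_{t-1}) + \gamma_1 \ln h_{t-1}
$$

$$
g(e_t) = e_t + \lambda \left( |e_t| - E|e_t| \right)
$$

where $c$ and $\alpha_1$ are real, $|\gamma_1| < 1$, and $e_t$ has a distribution that is symmetric about 0.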
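The asymmetry comes from the function $g$, whose slope is $1 + \lambda$ for positive shocks and $1 - \lambda$ for negative ones. The Python sketch below illustrates this with hypothetical parameter values (they are not the fitted ones) and assumes standard normal innovations, for which $E|e_t| = \sqrt{2/\pi}$.

```python
import math

E_ABS_NORMAL = math.sqrt(2.0 / math.pi)  # E|e_t| for a standard normal e_t (assumption)

def g(e, lam):
    """g(e) = e + lam * (|e| - E|e|)."""
    return e + lam * (abs(e) - E_ABS_NORMAL)

def next_log_variance(log_h_prev, e_prev, c, alpha1, gamma1, lam):
    """ln h_t = c + alpha1 * g(e_{t-1}) + gamma1 * ln h_{t-1}."""
    return c + alpha1 * g(e_prev, lam) + gamma1 * log_h_prev

# Hypothetical parameters, for illustration only.
params = dict(c=-0.2, alpha1=-0.1, gamma1=0.95, lam=0.3)
after_negative = next_log_variance(-8.0, -1.0, **params)
after_positive = next_log_variance(-8.0, 1.0, **params)
gap = after_negative - after_positive  # equals -2 * alpha1 for shocks of size 1
```

With a negative `alpha1`, a negative shock leaves the log variance higher than a positive shock of the same size, which is the second stylized fact listed above. Working with $\ln h_t$ also keeps $h_t$ positive whatever the sign of the coefficients.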

[Figure: Residuals forecasted by the EGARCH(1,1) model. Series: test set, forecasted values. y-axis: Log returns.]

RMSE=0.0241

[Figure: Prices forecasted by the EGARCH(1,1) model. Series: test set, forecasted values. y-axis: Price.]

This model lowers the RMSE (0.0241 versus 0.0263 for the GARCH(1,1) model). To go further, we can also try to change the mean equation of the GARCH model for a better fit. It is important to remember that, in order to really benefit from GARCH models other than the standard GARCH, we need to check whether the data we work with exhibit the stylized facts of financial data on which these models are based.

3 References

[1] Introduction to Time Series and Forecasting, Peter J. Brockwell, Richard A. Davis

[2] VLab NYU, Volatility Analysis, EGARCH, https://vlab.stern.nyu.edu/docs/volatility/EGARCH

[3] Medium, A complete introduction to time series analysis, https://medium.com/analytics-vidhya/a-complete-introduction-to-time-series-analysis-with-r-differencing-db94bc4df0ae

[4] Dr Bharatendra Rai, https://www.youtube.com/user/westlandindia